A New Agglomerative Hierarchical Clustering Algorithm Implementation based on the Map Reduce Framework
Authors
Abstract
Text clustering is a difficult and active research area in text mining. This paper presents a new text clustering algorithm that combines the Map Reduce framework with the neuron initialization method of the VPSOM (vector pressing Self-Organizing Model) algorithm. It divides a large text vector dataset into data blocks, each of which is then processed on a different distributed data node of the Map Reduce framework using an agglomerative hierarchical clustering algorithm. The experimental results indicate that the improved algorithm achieves higher efficiency and better accuracy.
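The block-wise scheme described in the abstract can be illustrated with a minimal single-process sketch. This is not the authors' implementation: the VPSOM neuron initialization is omitted, centroid linkage with Euclidean distance is an assumed choice, and every function name here (split_blocks, map_block, reduce_centroids, cluster_in_blocks) is hypothetical. The map step clusters each block locally, and the reduce step merges the resulting local centroids.

```python
from itertools import combinations
from math import dist


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def agglomerate(points, k):
    """Naive centroid-linkage agglomerative clustering down to k clusters."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Pick the pair of clusters whose centroids are closest (i < j).
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: dist(centroid(clusters[ij[0]]),
                                centroid(clusters[ij[1]])),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def split_blocks(vectors, block_size):
    """Divide the dataset into consecutive blocks of at most block_size."""
    return [vectors[s:s + block_size]
            for s in range(0, len(vectors), block_size)]


def map_block(block, k):
    """Map step (assumed): cluster one block, emit its local centroids."""
    return [centroid(c) for c in agglomerate(block, k)]


def reduce_centroids(local_centroids, k):
    """Reduce step (assumed): merge local centroids into k global ones.

    Cluster sizes are ignored here, so the merge is unweighted.
    """
    return [centroid(c) for c in agglomerate(local_centroids, k)]


def cluster_in_blocks(vectors, block_size, k):
    """Run the map step per block in sequence, then a single reduce."""
    local = []
    for block in split_blocks(vectors, block_size):
        local.extend(map_block(block, k))
    return reduce_centroids(local, k)
```

Calling cluster_in_blocks(vectors, block_size, k) on a list of numeric tuples returns k global centroids. In a real Map Reduce deployment each map_block call would run on a separate data node; the sequential loop here only stands in for that distribution.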
Similar resources
Fuzzy Agglomerative Clustering
In this paper, we describe fuzzy agglomerative clustering, a brand new fuzzy clustering algorithm. The basic idea of the proposed algorithm is based on the well-known hierarchical clustering methods. To achieve the soft or fuzzy output of the hierarchical clustering, we combine the single-linkage and complete-linkage strategies together with a fuzzy distance. As the algorithm was created recently,...
Methods of Hierarchical Clustering
We survey agglomerative hierarchical clustering algorithms and discuss efficient implementations that are available in R and other software environments. We look at hierarchical self-organizing maps, and mixture models. We review grid-based clustering, focusing on hierarchical density-based approaches. Finally we describe a recently developed very efficient (linear time) hierarchical clustering...
Implementation of Hybrid Clustering Algorithm with Enhanced K-Means and Hierarchal Clustering
We propose a hybrid clustering method that combines the strengths of both partitioning and agglomerative clustering methods. Clustering algorithms that build meaningful hierarchies out of large document collections are ideal tools for their interactive visualization and exploration as they provide data-views that are consistent, predictable, and at different levels of granularit...
Enhancing Map-Reduce Framework for Bigdata with Hierarchical Clustering
MapReduce is a software framework that allows certain kinds of parallelizable or distributable problems involving large data sets to be solved using computing clusters. This paper introduces our experience of grouping internet users by mining a huge volume of web access log of up to 500 gigabytes. The application is realized using hierarchical clustering algorithms with Map-Reduce, a parallel p...
Clustering Acoustic Segments Using Multi-Stage Agglomerative Hierarchical Clustering
Agglomerative hierarchical clustering becomes infeasible when applied to large datasets due to its O(N^2) storage requirements. We present a multi-stage agglomerative hierarchical clustering (MAHC) approach aimed at large datasets of speech segments. The algorithm is based on an iterative divide-and-conquer strategy. The data is first split into independent subsets, each of which is clustered se...
Journal title:
- JDCTA
Volume 4, issue
Pages -
Publication date 2010